Coreference Resolution for Morphologically Rich Languages. Adaptation of the Stanford System to Basque

Authors

  • Ander Soraluze
  • Olatz Arregi Uriarte
  • Xabier Arregi
  • Arantza Díaz de Ilarraza
Abstract

This paper presents the adaptation of the Stanford coreference resolution system to Basque, an agglutinative, head-final, pro-drop language. The adapted system has been integrated into a global linguistic analysis pipeline, so that its input is raw Basque text that has been linguistically processed and annotated. We demonstrate that language-specific characteristics have a noteworthy effect on coreference resolution. In agglutinative languages, using morphosyntactic features substantially improves the system's performance, yielding a CoNLL F1 gain of 5 points when automatic mentions are used and of 7.87 points when gold mentions are provided.

Similar resources

Corpus based coreference resolution for Farsi text

"Coreference resolution", or "finding all expressions that refer to the same entity" in a text, is one of the important requirements in natural language processing. Two words are coreferent when both refer to a single entity in the text or the real world. So the main task of coreference resolution systems is to identify terms that refer to a unique entity. A coreference resolution tool could be...

Coreference resolution with deep learning in the Persian Language

Coreference resolution is an advanced issue in natural language processing. Nowadays, due to the extension of social networks, TV channels, news agencies, the Internet, etc. in human life, reading all the contents, analyzing them, and finding a relation between them require time and cost. In the present era, text analysis is performed using various natural language processing techniques, one ...

Coreference Resolution for the Basque Language with BART

In this paper we present our work on Coreference Resolution in Basque, a unique language which poses interesting challenges for the problem of coreference. We explain how we extend the coreference resolution toolkit, BART, in order to enable it to process Basque. Then we run four different experiments showing both a significant improvement by extending a baseline feature set and the effect of c...

Mention detection: First steps in the development of a Basque coreference resolution system

This paper presents the first steps in the development of a Basque coreference resolution system. We propose a mention detector system based on a linguistic study of the nature of mentions. The system identifies mentions that are potential candidates to be part of coreference chains in Basque written texts. The mention detector is rule-based and has been implemented using finite state technolog...

Coreferential Relations in Basque: The Annotation Process.

In this paper we present the coreferential tagging of part of the EPEC Corpus of Basque. Although coreference is a pragmatic linguistic phenomenon highly dependent on the situational context, it shows some language-specific patterns that vary according to the features of each language. Due to the fact that Basque is not an Indo-European language, it differs considerably in grammar from the lang...

Journal:
  • Procesamiento del Lenguaje Natural

Volume 55

Publication date: 2015